Entity Disambiguation based on a Probabilistic Taxonomy
Abstract
This paper presents a method for entity disambiguation, one of the most substantial tasks a machine must perform to understand natural-language text. Terms in natural language are ambiguous: for example, “Barcelona” usually refers to a Spanish city, but it can also refer to a professional football club. Our work uses a probabilistic taxonomy that is as rich as our mental world in terms of the concepts of worldly facts it contains. We then employ a naive Bayes probabilistic model to disambiguate a term by identifying its related terms in the same document. Specifically, our method consists of two steps: clustering related terms, and conceptualizing the cluster using the probabilistic taxonomy. We cluster related terms probabilistically rather than with a threshold-based deterministic clustering approach. Our method automatically adjusts the relevance weight between two terms by taking the topic of the document into consideration, which lets us cluster without a sensitive, predefined threshold. We then conceptualize all possible clusters using the probabilistic taxonomy and aggregate the probabilities of each concept to find the most likely one. Experimental results show that our method outperforms threshold-based methods with optimally set thresholds, as well as several gold-standard approaches for entity disambiguation.
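The conceptualization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy taxonomy in which every probability is invented, and it covers only the naive Bayes scoring of concepts for a fixed set of related terms. The probabilistic clustering and topic-dependent weighting are left out. The names `P_TERM_GIVEN_CONCEPT`, `P_CONCEPT`, `UNSEEN`, `concept_posterior`, and `most_likely_concept` are hypothetical.

```python
from math import exp, log

# Hypothetical toy taxonomy: P(term | concept). All numbers are invented
# for illustration and do not come from the paper.
P_TERM_GIVEN_CONCEPT = {
    "city": {"Barcelona": 0.08, "Madrid": 0.05, "Paris": 0.06},
    "football club": {"Barcelona": 0.05, "Real Madrid": 0.09, "Chelsea": 0.07},
}

# Hypothetical prior P(concept), also invented.
P_CONCEPT = {"city": 0.6, "football club": 0.4}

# Assumed small probability for a term the concept has never been seen with.
UNSEEN = 1e-4


def concept_posterior(terms):
    """Return P(concept | terms) under a naive Bayes independence assumption."""
    log_scores = {}
    for concept, prior in P_CONCEPT.items():
        score = log(prior)
        for term in terms:
            score += log(P_TERM_GIVEN_CONCEPT[concept].get(term, UNSEEN))
        log_scores[concept] = score
    # Subtract the maximum before exponentiating, for numerical stability.
    top = max(log_scores.values())
    unnormalized = {c: exp(s - top) for c, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


def most_likely_concept(terms):
    """Pick the concept with the highest posterior for the given terms."""
    posterior = concept_posterior(terms)
    return max(posterior, key=posterior.get)


if __name__ == "__main__":
    print(most_likely_concept(["Barcelona"]))
    print(most_likely_concept(["Barcelona", "Chelsea"]))
```

With these invented numbers, “Barcelona” alone resolves to the city sense, while “Barcelona” together with “Chelsea” resolves to the football-club sense. This mirrors the idea of disambiguating a term through its related terms in the same document.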
Similar resources
Combining Textual and Graph-Based Features for Named Entity Disambiguation Using Undirected Probabilistic Graphical Models
Named Entity Disambiguation (NED) is the task of disambiguating named entities in a natural language text by linking them to their corresponding entities in a knowledge base such as DBpedia, which are already recognized. It is an important step in transforming unstructured text into structured knowledge. Previous work on this task has proven a strong impact of graph-based methods such as PageRa...
HYENA: Hierarchical Type Classification for Entity Names
Inferring lexical type labels for entity mentions in texts is an important asset for NLP tasks like semantic role labeling and named entity disambiguation (NED). Prior work has focused on flat and relatively small type systems where most entities belong to exactly one type. This paper addresses very fine-grained types organized in a hierarchical taxonomy, with several hundreds of types at diffe...
Deep Joint Entity Disambiguation with Local Neural Attention
We propose a novel deep learning model for joint document-level entity disambiguation, which leverages learned neural representations. Key components are entity embeddings, a neural attention mechanism over local context windows, and a differentiable joint inference stage for disambiguation. Our approach thereby combines benefits of deep learning with more traditional approaches such as graphic...
Unsupervised Concept Hierarchy Induction: Learning the Semantics of Words
Unsupervised concept hierarchy induction, or taxonomy learning, is the task of hierarchically classifying word senses in order to develop a taxonomy of concepts. Taxonomies of concepts such as the one found in WordNet (Fellbaum, 1998) are important resources for a variety of Natural Language Processing (NLP) including word sense disambiguation (Ramakrishnan et al., 2004; Navigli & Velardi, 2004...
Combining Mention Context and Hyperlinks from Wikipedia for Named Entity Disambiguation
Named entity disambiguation is the task of linking entity mentions to their intended referent, as represented in a Knowledge Base, usually derived from Wikipedia. In this paper, we combine local mention context and global hyperlink structure from Wikipedia in a probabilistic framework. Our results show that the two models of context, namely, words in the context and hyperlink pathways to other ...
Publication date: 2011